What is whatwg-encoding?
The whatwg-encoding npm package exposes some of the primitives of the WHATWG Encoding Standard to JavaScript. It lets you decode bytes using the character encodings defined by the standard, resolve encoding labels to their canonical names, and detect byte order marks (BOMs).
What are whatwg-encoding's main functionalities?
Text Decoding
This feature decodes a Uint8Array of text data into a string using a specified fallback encoding; if the bytes start with a byte order mark (BOM), the encoding it indicates is used instead. In the code sample, a Uint8Array representing the UTF-8 encoded string 'Hello' is decoded back into a JavaScript string.
"use strict";
const { decode } = require('whatwg-encoding');
const uint8Array = new Uint8Array([0x48, 0x65, 0x6c, 0x6c, 0x6f]);
const text = decode(uint8Array, 'UTF-8'); // an encoding name, as returned by labelToName('utf-8')
console.log(text); // 'Hello'
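For comparison, the built-in TextDecoder class (a global in modern Node.js and browsers, so no package is needed) also decodes bytes, but it never lets a byte order mark switch the encoding: a BOM is only stripped when it matches the decoder's own encoding. whatwg-encoding's decode, by contrast, lets any BOM override the fallback encoding. A minimal sketch of the built-in behavior, using only standard APIs:

```javascript
"use strict";
// Bytes for "Hi" in UTF-16LE, preceded by a UTF-16LE byte order mark (FF FE).
const utf16Bytes = new Uint8Array([0xff, 0xfe, 0x48, 0x00, 0x69, 0x00]);

// Decoding with the matching encoding strips the BOM.
const asUtf16 = new TextDecoder("utf-16le").decode(utf16Bytes);
console.log(asUtf16); // Hi

// Decoding as UTF-8 ignores what the BOM means: FF and FE are invalid UTF-8
// bytes (each becomes U+FFFD), and the NUL bytes stay in the string.
const asUtf8 = new TextDecoder("utf-8").decode(utf16Bytes);
console.log(asUtf8 === "\uFFFD\uFFFDH\u0000i\u0000"); // true
```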
Text Encoding
whatwg-encoding does not provide an encoder: its API consists of decode, labelToName, isSupported, and getBOMEncoding. To encode a JavaScript string into a Uint8Array of UTF-8 bytes, use the TextEncoder class built into Node.js and browsers. In the code sample, the string 'Hello' is encoded into a Uint8Array using UTF-8.
"use strict";
// TextEncoder is a global in modern Node.js and browsers; it always produces UTF-8.
const encoder = new TextEncoder();
const text = 'Hello';
const uint8Array = encoder.encode(text);
console.log(uint8Array); // Uint8Array(5) [ 72, 101, 108, 108, 111 ]
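The Encoding Standard's own encoder API, TextEncoder, only ever produces UTF-8. A minimal round-trip sketch using the TextEncoder and TextDecoder built-ins (not whatwg-encoding), showing that a non-ASCII character takes more than one byte:

```javascript
"use strict";
// "Héllo" written with an escape so the source file's encoding does not matter.
const input = "H\u00e9llo";

// TextEncoder always encodes to UTF-8; its constructor takes no encoding argument.
const encoded = new TextEncoder().encode(input);
console.log(Array.from(encoded)); // [ 72, 195, 169, 108, 108, 111 ]

// Decoding the same bytes as UTF-8 recovers the original string.
const roundTripped = new TextDecoder("utf-8").decode(encoded);
console.log(roundTripped === input); // true
```

The "é" (U+00E9) becomes the two bytes 195 and 169 (0xC3 0xA9), which is why the byte length (6) differs from the string length (5).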
Other packages similar to whatwg-encoding
iconv-lite
iconv-lite is a pure JavaScript character encoding conversion library. It supports a wide range of encodings and is often used to encode and decode text in many different character sets. whatwg-encoding is itself a thin layer on top of iconv-lite, so iconv-lite supports a broader range of encodings and can encode as well as decode, but it uses its own encoding names and aliases rather than the Encoding Standard's label table.
text-encoding
text-encoding is an implementation of the Encoding Standard's TextEncoder and TextDecoder APIs. Unlike whatwg-encoding, which exposes lower-level functions (decode, labelToName, isSupported, getBOMEncoding), it provides those web-platform classes.
Decode According to the WHATWG Encoding Standard
This package provides a thin layer on top of iconv-lite which makes it expose some of the same primitives as the Encoding Standard.
const whatwgEncoding = require("whatwg-encoding");
console.assert(whatwgEncoding.labelToName("latin1") === "windows-1252");
console.assert(whatwgEncoding.labelToName(" CYRILLic ") === "ISO-8859-5");
console.assert(whatwgEncoding.isSupported("IBM866") === true);
console.assert(whatwgEncoding.isSupported("UTF-32") === false);
console.assert(whatwgEncoding.isSupported("x-mac-cyrillic") === false);
console.assert(whatwgEncoding.getBOMEncoding(new Uint8Array([0xFE, 0xFF])) === "UTF-16BE");
console.assert(whatwgEncoding.getBOMEncoding(new Uint8Array([0x48, 0x69])) === null);
console.assert(whatwgEncoding.decode(new Uint8Array([0x48, 0x69]), "UTF-8") === "Hi");
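The labelToName assertions above follow the Encoding Standard's get an encoding algorithm: remove leading and trailing ASCII whitespace, then match the label ASCII case-insensitively against a table of labels. A minimal sketch of that lookup, assuming a hypothetical and heavily truncated LABELS table (the standard's real table lists every label of every encoding); this is an illustration, not the package's source:

```javascript
"use strict";
// Hypothetical, partial label table; the Encoding Standard defines many more labels.
const LABELS = new Map([
  ["latin1", "windows-1252"],
  ["iso-8859-1", "windows-1252"],
  ["windows-1252", "windows-1252"],
  ["cyrillic", "ISO-8859-5"],
  ["iso-8859-5", "ISO-8859-5"],
  ["ibm866", "IBM866"],
  ["koi8-r", "KOI8-R"],
  ["utf-8", "UTF-8"],
  ["utf8", "UTF-8"],
]);

function labelToNameSketch(label) {
  // Remove leading/trailing ASCII whitespace only (TAB, LF, FF, CR, SPACE);
  // String.prototype.trim() would also strip non-ASCII whitespace.
  const trimmed = String(label).replace(/^[\t\n\f\r ]+|[\t\n\f\r ]+$/g, "");
  // Lowercase A-Z only; toLowerCase() would also turn the Kelvin sign
  // (U+212A) into "k", which an ASCII case-insensitive match must not do.
  const lowered = trimmed.replace(/[A-Z]/g, (c) => c.toLowerCase());
  return LABELS.get(lowered) ?? null;
}

console.log(labelToNameSketch(" CYRILLic ")); // ISO-8859-5
console.log(labelToNameSketch("latin1")); // windows-1252
console.log(labelToNameSketch("UTF-32")); // null
```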
API
- decode(uint8Array, fallbackEncodingName): performs the decode algorithm (in which any BOM will override the passed fallback encoding), and returns the resulting string
- labelToName(label): performs the get an encoding algorithm and returns the resulting encoding's name, or null for failure
- isSupported(name): returns whether the encoding is one of the encodings of the Encoding Standard, and is an encoding that this package can decode (via iconv-lite)
- getBOMEncoding(uint8Array): sniffs the first 2–3 bytes of the supplied Uint8Array, returning one of the encoding names "UTF-8", "UTF-16LE", or "UTF-16BE" if the appropriate BOM is present, or null if no BOM is present
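The BOM sniffing and the BOM-overrides-fallback rule of decode described in the API list can be approximated with Node's built-in TextDecoder. This is a minimal sketch under that assumption, not the package's implementation (which delegates to iconv-lite); the function names sniffBOM and decodeSketch are hypothetical:

```javascript
"use strict";
// Mirrors getBOMEncoding: inspect the first 2-3 bytes for a byte order mark.
function sniffBOM(bytes) {
  if (bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) return "UTF-8";
  if (bytes[0] === 0xfe && bytes[1] === 0xff) return "UTF-16BE";
  if (bytes[0] === 0xff && bytes[1] === 0xfe) return "UTF-16LE";
  return null;
}

// Mirrors decode: a BOM, if present, overrides the fallback encoding.
// TextDecoder strips a BOM that matches its own encoding, so the BOM bytes
// do not appear in the result.
function decodeSketch(bytes, fallbackEncoding) {
  const encoding = sniffBOM(bytes) ?? fallbackEncoding;
  return new TextDecoder(encoding).decode(bytes);
}

const withBOM = new Uint8Array([0xff, 0xfe, 0x48, 0x00, 0x69, 0x00]);
console.log(sniffBOM(withBOM)); // UTF-16LE
console.log(decodeSketch(withBOM, "windows-1252")); // Hi
console.log(decodeSketch(new Uint8Array([0x48, 0x69]), "UTF-8")); // Hi
```

One difference from the package: TextDecoder accepts any encoding label, and the set of encodings it can decode depends on the ICU data Node.js was built with.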
Unsupported encodings
Since we rely on iconv-lite, we can only support the encodings that it supports. Currently we are missing support for:
- ISO-2022-JP
- ISO-8859-8-I
- replacement
- x-mac-cyrillic
- x-user-defined
Passing these encoding names to isSupported will return false, and passing any of the possible labels for these encodings to labelToName will return null.
Credits
This package was originally based on the excellent work of @nicolashenry, in jsdom. It has since been pulled out into this separate package.
Alternatives
If you are looking for a JavaScript implementation of the Encoding Standard's TextEncoder and TextDecoder APIs, you'll want @inexorabletash's text-encoding package. Node.js also has them built in.